Data Structure and Import

Given are 1344 ETFs from the US. As shown below, the data consists of the Variables Date, Open, High, Low, Close, Volume and OpenInt. The Variable Price has been added for every Date by taking the mean of the High and Low for every date.

##          Date   Open   High    Low  Close Volume OpenInt   Price
## 1: 2010-07-21 24.333 24.333 23.946 23.946  43321       0 24.1395
## 2: 2010-07-22 24.644 24.644 24.362 24.487  18031       0 24.5030
## 3: 2010-07-23 24.759 24.759 24.314 24.507   8897       0 24.5365
## 4: 2010-07-26 24.624 24.624 24.449 24.595  19443       0 24.5365
## 5: 2010-07-27 24.477 24.517 24.431 24.517   8456       0 24.4740
## 6: 2010-07-28 24.477 24.517 24.352 24.431   4967       0 24.4345

The following calculations have been made on all ETFs with 1000 or more data points. This is the case for 1092 ETFs.
I calculated the Overall returns simply as the percent that the ETF has grown from the first observation until the last.

overall_return <- function(x){
  o.return <- (x[which.max(Date)]$Price - x[which.min(Date)]$Price)/x[which.min(Date)]$Price
  o.return
  }

overall_returns <- sapply(etfs.red, function(x) overall_return(x))

Volatility Measures

“In finance, volatility is the degree of variation of a trading price series over time.” Wiki

Historic Volatility

A very common measure of volatility is the Historic Volatility (\(=HV\)).
It is defined as: \[\begin{aligned} HV &= sd(R) \\ R &= ln(\frac{V_f}{V_i}) \end{aligned}\]

\(R =\) logarithmic return
\(V_i =\) price when market closed on day i (I used the mean price for day i instead) \(V_f =\) price when market closed of the next day (I used the mean price for the next day instead)

#Logarithmic or continuously compounded return
get_log_return <- function(x){
  nextP <- c(0, x$Price[1:length(x$Price)-1])
  x[, log_return := log(nextP/Price)]
}

get_abs_return <- function(x){
  nextP <- c(0, x$Price[1:length(x$Price)-1])
  x[, abs_return := nextP-Price]
}

etfs.red <- lapply(etfs.red, get_log_return)
etfs.red <- lapply(etfs.red, get_abs_return)

#Historic Volatility
hist_vol <- sapply(etfs.red, function(x) sd(x$log_return[-1]))

Fractal Dimension

The fractal dimension can be thought of a measure of roughness for a given geometric object (including curves). To calculate the fractal dimension for finencial price charts, various methods have been suggested. Some of them can get quite complicated. I choosed a rather simple one by John Ehlers (Original publication). The Formula to calculate the fractal dimension \(D\) is as follows:

\[\begin{aligned} D &= \frac{Log(HL1 + HL2) - Log(HL)}{Log(2)} \\ HL1 &= \frac{Max(High, \frac{1}{2}N..N)-Min(Low,\frac{1}{2}N..N)}{\frac{1}{2}N} \\ HL2 &= \frac{Max(High, \frac{1}{2}N)-Min(Low,\frac{1}{2}N)}{\frac{1}{2}N} \\ HL &= \frac{(Max(High,N) - Min(Low,N))}{N} \end{aligned}\]

The fractal dimension can be thought of a measure of volatility for a finencial price chart.

#Function to calculate fractal dimension as intruduced by John Ehlers
fractal_dimension <- function(data){
  
  data.half <- nrow(data)/2
  first.half <- 1:round(data.half)
  data1 <- data[first.half]
  data2 <- data[!first.half]
  
  hl1 <- (max(data1$Price) - min(data1$Price)) / data.half
  hl2 <- (max(data2$Price) - min(data2$Price)) / data.half
  hl <- (max(data$Price) - min(data$Price)) / nrow(data)
  
  D <- (log(hl1 + hl2) - log(hl)) / log(2)
  D
}

#Get for all ETFs the fractal dimension 'D'
fractal_dimensions <- sapply(etfs.red, function(x) fractal_dimension(x))

#First look at fractal dimensions
  #table(round(fractal_dimensions, 1))

Beta

“In finance, the beta (\(\beta\) or beta coefficient) of an investment indicates whether the investment is more or less volatile than the market as a whole.” Wiki

As a reference for the whole market I chose the S&P500 index. I collected the data from Yahoo-Finance.

sp500 <- fread("sp500.csv", sep = ",")
sp500[, Price := apply(sp500, 1, function(x) mean(as.numeric(c(x["High"], x["Low"]))))]
sp500 <- sp500[, c("Date", "Price")]
sp500$Date <- as.Date(sp500$Date)

get_log_return(sp500)
sp500$log_return[1] <- 0

The beta is then defined as: \[beta = \frac{Cov(r_a, r_b)}{Var(r_b)}\]

Where:
\(r_a =\) Log return of ETF
\(r_b =\) Log return of S&P500

sp500.var <- var(sp500$log_return[-1])

get_beta <- function(x, sp500.var){
  x$log_return[1] <- 0
  #Merge to get overlaping Dates
  a <- merge(x, sp500, by = "Date")
  x.cov <- cov(a$log_return.x, a$log_return.y)
  beta <- x.cov/sp500.var
  beta
}

betas <- sapply(etfs.red, function(x) get_beta(x, sp500.var))

Sharpe Ratio

“In finance, the Sharpe ratio (also known as the Sharpe index, the Sharpe measure, and the reward-to-variability ratio) is a way to examine the performance of an investment by adjusting for its risk. The ratio measures the excess return (or risk premium) per unit of deviation in an investment asset or a trading strategy, typically referred to as risk, named after William F. Sharpe.” Wiki

I calculated the Sharpe Ratio with the following formula: \[S = \frac{\bar{r_p} - r_f}{sd(r_p)}\]

Where:
\(r_p =\) ETF return
\(r_f =\) Risk free rate

A common value for the Risk free rate are the expected return of U.S. Treasury bills. Since they are very low and for an easy calculation I have set the Risk free rate to zero.

calc_sharpe_ratio <- function(x, abs = FALSE){
  if(abs == FALSE){
 sharpe <- mean(x$log_return[-1])/sd(x$log_return[-1])
 sharpe
} else {
    sharpe <- mean(x$abs_return[-1])/sd(x$abs_return[-1])
    sharpe
  }
}

sharpe_ratios <- sapply(etfs.red, function(x) calc_sharpe_ratio(x))
sharpe_ratios_abs <- sapply(etfs.red, function(x) calc_sharpe_ratio(x, abs = TRUE))

Volatility Comparisons

In the plot below the distribution for all used 1092 ETFs are shown.

volatility_measures <- data.table(etf = names(etfs.red), hist_vol, betas, sharpe_ratios, fractal_dimensions, overall_returns)

volatility_measures_hist <- melt(volatility_measures[, -c("etf", "overall_returns")], variable.name = "measure")
levels(volatility_measures_hist$measure) <- c("Historic Volatilitys", "Betas", "Sharpe Ratio", "Fractal Dimensions")
ggplot(volatility_measures_hist, aes(value)) + 
  geom_histogram(bins = 20, col = "black", fill = "lightgrey") + 
  facet_wrap(~measure, scales = "free") +
  labs(title = "Distrubution of volatility measurments") + 
  theme_minimal()

Measure Value Interpretation
Historic Volatility small Price moves slow
big Price moves fast
Beta < 1 Price moves slower than the market
1 Price moves with the market
> 1 Price moves faster than the market
Sharpe Ratio small High risk, low return
big Low risk, high return
Fractal Dimension 1 Simple line (0 volatility)
2 Area (maximal volatility)

As the first step for compaaring the volatility measures I looked at the correlation between there measures. Additionally to the volatility measures I tested if there are correlations with the overall returns of the ETFs and the volatility measures. As can be seen in the pairs plot below, there are no correlations between the volatlity measures. The strongest correlation is between the historic volatility and the Sharpe ratio. This should be not surprising, since I defined 0 as the risk free rate. Therefore the Sharpe ratio can be thought of as the mean log return, divided by the historic volatility.

pairs.panels(volatility_measures[, -c("etf")])

Another way of visualising the relation between the volatlity measures and th overall returns is shown below. The ETFs with strong overall returns seem to be somehow related to the volatility measures. This could be an interesting starting point to determine a value for any ETF based on its volatility measures.

g_legend <- function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)}

legend <- ggplot(volatility_measures, 
       aes(hist_vol, betas, size = overall_returns, color = overall_returns)) + 
  geom_point() +
  scale_color_continuous(low = "lightgrey", high = "black") + 
  guides(color= guide_legend(title="Log[Overall Return]"), size=guide_legend(title="Log[Overall Return]"))



legend <- g_legend(legend)

p1 <- ggplot(volatility_measures, 
       aes(hist_vol, betas, size = overall_returns, color = overall_returns)) + 
  geom_point(aes(alpha = 0.3), show.legend = F) +
  scale_colour_continuous(low = "lightgrey", high = "black") +
  xlab("HV") + 
  ylab("Beta") +
  theme_minimal() 


p2 <- ggplot(volatility_measures,
        aes(hist_vol, sharpe_ratios_abs, size = overall_returns, color = overall_returns)) +
  geom_point(aes(alpha = 0.3), show.legend = F) +
  scale_color_continuous(low = "lightgrey", high = "black") + 
  xlab("HV") + 
  ylab("Sharpe Ratio") +
  theme_minimal()

p3 <- ggplot(volatility_measures,
       aes(hist_vol, fractal_dimensions, size = overall_returns, color = overall_returns)) +
  geom_point(aes(alpha = 0.3), show.legend = F) +
  scale_color_continuous(low = "lightgrey", high = "black") + 
  xlab("HV") + 
  ylab("D") +
  theme_minimal()

p4 <- ggplot(volatility_measures,
       aes(betas, sharpe_ratios_abs, size = overall_returns, color = overall_returns)) +
  geom_point(aes(alpha = 0.3), show.legend = F) +
  scale_color_continuous(low = "lightgrey", high = "black") + 
  xlab("Beta") + 
  ylab("Sharpe Ratio") +
  theme_minimal()

p5 <- ggplot(volatility_measures,
       aes(betas, fractal_dimensions, size = overall_returns, color = overall_returns)) +
  geom_point(aes(alpha = 0.3), show.legend = F) +
  scale_color_continuous(low = "lightgrey", high = "black") + 
  xlab("Beta") + 
  ylab("D") +
  theme_minimal()

p6 <- ggplot(volatility_measures,
       aes(sharpe_ratios_abs, fractal_dimensions, size = overall_returns, color = overall_returns)) +
  geom_point(aes(alpha = 0.3), show.legend = F) +
  scale_color_continuous(low = "lightgrey", high = "black") +
  xlab("Sharpe Ratio") + 
  ylab("D") +
  theme_minimal()

layout <- rbind(c(1,NA,7),c(2,4,NA),c(3,5,6))

grid.arrange(p1, p2, p3, p4, p5, p6, legend, 
             layout_matrix = layout,
             top = "Volatility Measures of >1000 ETFs")

Examples for the most extreme Volatility values

In the following plots I compare the ETFs with the lowest and highest values for all the calculated volatility measures. Additionally to the Price chart I plotted the daily price fluctuation. This is simply the change of the Price from one day to the next.

price_fluc <- function(price.chart){
  next.price <- c(NA, price.chart[1:nrow(price.chart)-1]$Price)
  price.chart[, price_change := Price - next.price]
}

plot_price_chart <- function(etfs, variable, variable.name = "", max = TRUE){
  require(magrittr)
  require(plotly)
  require(data.table)
  
  if(max == TRUE){
  
    max.chart <- etfs[[which.max(variable)]]
    max_var <- variable[which.max(variable)]
    max.chart <- price_fluc(max.chart)
    
    p <- max.chart %>% plot_ly(x = ~Date, y = ~Price) %>% add_lines()

    p1 <- max.chart %>% 
      plot_ly(x = ~Date, y = ~price_change, type = "bar") %>%
      layout(yaxis = list(title = "Price Change"))

    subplot(p, p1, nrows = 2, shareX = T, titleY = T) %>% 
      layout(title=paste("Price chart of ",
                   names(max_var) ,
                   " (",
                   variable.name,
                   " = ",
                   round(max_var, 4), ")", 
                   sep = ""), showlegend = F)

  } else {
    
    min.chart <- etfs[[which.min(variable)]]
    min_var <- variable[which.min(variable)]
    min.chart <- price_fluc(min.chart)
    
    p <- min.chart %>% plot_ly(x = ~Date, y = ~Price) %>% add_lines()

    p1 <- min.chart %>% 
      plot_ly(x = ~Date, y = ~price_change, type = "bar") %>%
      layout(yaxis = list(title = "Price Change"))

    subplot(p, p1, nrows = 2, shareX = T, titleY = T) %>% 
      layout(title=paste("Price chart of ",
                   names(min_var) ,
                   " (",
                   variable.name,
                   " = ",
                   round(min_var, 4), ")", 
                   sep = ""), showlegend = F)
    
  }}

Historic Volatility

Note how the fxo.us ETF has on the 29 September 2008 a huge increase in its price followed by a immidiate decrease. This fluctuation is probably a result of the big market drops on that day.
CNBC on the 29 September 2008: “Investors are stunned and dump stocks frantically until the Dow ends 777 points lower, at 10365.45, its biggest one-day point drop ever. The S&P 500 also logs its biggest one-day point drop, falling 106.59, or 8.8 percent, to 1106.42. The Nasdaq has its biggest one-day point decline since 2000, falling 199.61, or 9.1 percent, to 1983.73.” Link

plot_price_chart(etfs.red, hist_vol, "HV", max = T)
plot_price_chart(etfs.red, hist_vol, "HV", max = F)

Fractal Dimonesion

In these to ollowing plots it is very striking, that on the 18 August 2016 there is a huge drop compared to all other fluctiations. CNBC on the ‘flash crash’ on the 25 August 2015: “There were 1,278 trading halts for 471 different ETFs and stocks. Because of this, it was not possible to calculate the value of many ETFs, or hedge or trade ETFs and stocks at a ‘correct’ price.” Link

plot_price_chart(etfs.red, fractal_dimensions, "D", max = T)
plot_price_chart(etfs.red, fractal_dimensions, "D", max = F)

Beta

plot_price_chart(etfs.red, betas, "Beta", max = T)
plot_price_chart(etfs.red, betas, "Beta", max = F)

Sharpe Ratio

plot_price_chart(etfs.red, sharpe_ratios, "Sharpe Ratio", max = T)
plot_price_chart(etfs.red, sharpe_ratios, "Sharpe Ratio", max = F)